feat: add trace stats computation#51
Conversation
a7773cc to
070ae12
Compare
2056c19 to
d5a3f08
Compare
006cb8e to
4142915
Compare
5a64507 to
3562c93
Compare
|
@codex review |
There was a problem hiding this comment.
Pull request overview
This PR adds trace stats computation capabilities to the Serverless Compatibility Layer. It enables the system to compute and send trace statistics when tracers haven't already computed them, enhancing stats availability across all supported serverless environments.
Changes:
- Adds a new
StatsConcentratorServicemodule that manages span concentration and stats bucket generation using thelibdd-trace-statslibrary - Adds a new
StatsGeneratormodule that extracts metadata and spans from traces and sends them to the stats concentrator - Extends
ServerlessTraceProcessorto optionally call the stats generator when the tracer hasn't already computed stats - Extends
ServerlessStatsFlusherto flush both tracer-computed and serverless-computed stats - Adds integration tests to verify stats computation and to verify stats are skipped when client has already computed them
- Adds configuration options for service, env, and version fields, and an environment variable to enable/disable stats computation
Reviewed changes
Copilot reviewed 12 out of 13 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
src/stats_concentrator_service.rs |
New module implementing stats concentration service with metadata tracking and flush logic |
src/stats_generator.rs |
New module for extracting trace information and sending to stats concentrator |
src/trace_processor.rs |
Modified to conditionally invoke stats generator based on client-computed-stats header |
src/stats_flusher.rs |
Modified to flush both channel and concentrator-based stats with proper flow control |
src/config.rs |
Added service, env, version fields and corresponding environment variable reading |
src/main.rs |
Added initialization of stats computation with environment variable toggle |
src/lib.rs |
Exported new modules |
src/mini_agent.rs |
Minor code style improvements (let-else pattern) |
Cargo.toml |
Added thiserror and libdd-trace-stats dependencies |
tests/integration_test.rs |
Added comprehensive integration tests for stats generation scenarios |
tests/common/mock_server.rs |
Updated to handle stats intake with proper 202 status code response |
tests/common/helpers.rs |
Extended send_tcp_request to support additional headers |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ac8d049051
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| self.flush_stats(config.clone(), buffer.to_vec()).await; | ||
| buffer.clear(); | ||
| // copy the batch for this flush | ||
| let channel_stats = buffer.to_vec(); |
There was a problem hiding this comment.
Nit: LLM recommends let channel_stats = std::mem::take(&mut *buffer); to replace let channel_stats = buffer.to_vec(); buffer.clear(); but that's unrelated to the actual intent of the change here, so.
There was a problem hiding this comment.
Applied this change!
| } | ||
| Ok(None) => {} | ||
| Err(e) => { | ||
| error!("Failed to flush stats concentrator: {e}"); |
There was a problem hiding this comment.
Do we want a log in a loop here or a error to be caught and have the whole process restarted/shut down? The trace flusher and stats flusher both have a a Err("<this code> task terminated unexpectedly".into());, would be consistent to have that for the concentrator as well.
There was a problem hiding this comment.
I added a task terminated unexpectedly check for the span concentrator service to be consistent with these other background tasks.
9e4544f to
c2191f8
Compare
df2f714 to
3b74133
Compare
…before asserting the results
3b74133 to
e46f752
Compare
| pub fn new(config: Arc<Config>) -> (Self, StatsConcentratorHandle) { | ||
| let (tx, rx) = mpsc::unbounded_channel(); | ||
| let handle = StatsConcentratorHandle::new(tx); | ||
| // TODO: set span_kinds_stats_computed and peer_tag_keys |
There was a problem hiding this comment.
when will we do this todo?
There was a problem hiding this comment.
I plan to work on it this week!
| // Drain client stats in buffer and stats from concentrator on interval | ||
| _ = interval.tick() => { | ||
| let client_stats = std::mem::take(&mut buffer); | ||
| if !client_stats.is_empty() || self.stats_concentrator.is_some() { |
There was a problem hiding this comment.
why do we check for the stats concentrator to be some here?
There was a problem hiding this comment.
The stats flusher is either flushing trace stats received from the tracer in client_stats or trace stats computed by the agent in stats_concentrator. We want to flush if either one of these cases is true. I added more comments to clarify this.
| tracer_language: tracer_payload.language_name.clone(), | ||
| tracer_version: tracer_payload.tracer_version.clone(), | ||
| hostname: String::new(), | ||
| service_name: None, |
There was a problem hiding this comment.
the payload doesn't have a service name to pull?
There was a problem hiding this comment.
Correct, the TracerPayload does not have a service name to pull.
There was a problem hiding this comment.
While the TracerPayload struct does not have a service name to pull, we can pull the service name from _dd.base_service on the root span.
See #51 (comment)
| } | ||
| } | ||
| } else { | ||
| error!("Unsupported tracer payload version. Failed to send trace stats."); |
There was a problem hiding this comment.
we should probably say which version we saw that we don't know how to handle. i saw a proposal for a new tracer payload version coming down the pike
There was a problem hiding this comment.
Added the unsupported tracer payload version to the error log.
# What does this PR do? Derives PartialEq and Eq on TracerMetadata, enabling equality comparisons between instances using `==` and `!=`. # Motivation Adding these standard library traits enables equality assertions without requiring callers to implement their own field by field comparison. Needed for DataDog/serverless-components#51 (comment) # Additional Notes # How to test the change? Unit tests Co-authored-by: duncan.harvey <duncan.harvey@datadoghq.com>
What does this PR do?
Motivation
Some tracers support trace stats computation, but all don't support it, or have it disabled by default. Trace stats computation in the Serverless Compatibility Layer allows all environments supported by this package to have trace stats.
https://datadoghq.atlassian.net/browse/SVLS-8032
Additional Notes
DD_SERVERLESS_STATS_COMPUTATION_ENABLED=trueclient_drop_p0s: falsein agent/infoendpoint until we sufficiently test tracer computed stats in Serverless environmentsclient_computed: falseto stats intakeserverless-components/crates/datadog-trace-agent/src/mini_agent.rs
Lines 298 to 300 in 9894ef7
Describe how to test/QA your changes
Datadog-Client-Computed-Statsis present in the trace payload